No-regret algorithms for structured prediction problems—DRAFT
Abstract
No-regret algorithms are a popular class of learning rules that map a sequence of input vectors (x1, x2, ...) to a sequence of predictions (y1, y2, ...). Unfortunately, most no-regret algorithms assume that the predictions yt are chosen from a small, discrete set. We consider instead prediction problems where yt has internal structure: yt might be a strategy in a game like poker, or a configuration of a data structure like a rebalancing binary search tree. We derive a family of no-regret learning rules, called Lagrangian Hedging algorithms, to take advantage of this structure. Our algorithms are a direct generalization of known no-regret learning rules like weighted majority and regret matching. In addition to proving regret bounds, we demonstrate one of our algorithms learning to play one-card poker.
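For orientation, here is a minimal sketch of regret matching in the small, discrete setting that the abstract contrasts with structured prediction. It is not the Lagrangian Hedging algorithm itself, only one of the known rules the abstract says it generalizes. The function name `regret_matching`, the loss-vector input format, and the assumption that losses lie in [0, 1] are illustrative choices, not taken from the paper.

```python
def regret_matching(loss_vectors):
    """Run regret matching over a finite action set (illustrative sketch).

    loss_vectors: list of per-round loss lists, one entry per action,
                  assumed here to lie in [0, 1].
    Returns (plays, regrets): the distribution played in each round and the
    final cumulative regret against each fixed action.
    """
    n = len(loss_vectors[0])
    regrets = [0.0] * n  # learner's cumulative loss minus each action's loss
    plays = []
    for losses in loss_vectors:
        # Play each action with probability proportional to its positive regret.
        positive = [max(r, 0.0) for r in regrets]
        norm = sum(positive)
        if norm > 0:
            probs = [p / norm for p in positive]
        else:
            probs = [1.0 / n] * n  # no positive regret yet: play uniformly
        plays.append(probs)
        # Expected loss of the mixed prediction in this round.
        expected = sum(p * l for p, l in zip(probs, losses))
        # Update the regret against every fixed action.
        for i in range(n):
            regrets[i] += expected - losses[i]
    return plays, regrets


if __name__ == "__main__":
    # Hypothetical loss sequence: two actions alternate between good and bad,
    # and a third has constant loss 0.25.
    demo = [[float(t % 2), float((t + 1) % 2), 0.25] for t in range(1000)]
    plays, regrets = regret_matching(demo)
    print("average regret per round:", max(regrets) / len(demo))
```

With losses in [0, 1], the largest cumulative regret in this sketch grows no faster than the square root of (number of actions × number of rounds), so the average regret per round tends to zero. That is the "no-regret" property for a discrete action set; the abstract's contribution concerns the case where the prediction set has internal structure instead.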
Similar resources
No-regret algorithms for structured prediction problems
No-regret algorithms are a popular class of online learning rules. Unfortunately, most no-regret algorithms assume that the set Y of allowable hypotheses is small and discrete. We consider instead prediction problems where Y has internal structure: Y might be the set of strategies in a game like poker, the set of paths in a graph, or the set of configurations of a data structure like a rebalanc...
(Online) Subgradient Methods for Structured Prediction
Promising approaches to structured learning problems have recently been developed in the maximum margin framework. Unfortunately, algorithms that are computationally and memory efficient enough to solve large scale problems have lagged behind. We propose using simple subgradient-based techniques for optimizing a regularized risk formulation of these problems in both online and batch settings, a...
Following the Perturbed Leader for Online Structured Learning
We investigate a new Follow the Perturbed Leader (FTPL) algorithm for online structured prediction problems. We show a regret bound which is comparable to the state of the art of FTPL algorithms and is comparable with the best possible regret in some cases. To better understand FTPL algorithms for online structured learning, we present a lower bound on the regret for a large and natural class o...
Reinforcement and Imitation Learning via Interactive No-Regret Learning
Recent work has demonstrated that problems (particularly imitation learning and structured prediction) where a learner's predictions influence the input distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require nor benefit from information about the cost of acti...
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches (Daumé III et al., 2009; Ross and Bagnell, 2010) provide stronger guarantees in this setting, but remain somewhat u...